@Nasf-Fan
Contributor

@Nasf-Fan Nasf-Fan commented May 9, 2025

Otherwise, it may mislead a subsequent crt_get_filtered_grp_rank_list() call into treating the IV root as absent from the rank list, causing the related IV operation to fail with -DER_NONEXIST.

This may not be a perfect solution for the current CaRT IV logic, but as a temporary measure it lets CR work when some ranks are dead.

Add a new test case to verify this corner case.
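
The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not CaRT code: `filter_ranks()`, `find_root()`, and `SKETCH_NONEXIST` are hypothetical names introduced here, and the sketch only assumes that a filtered rank list is built by dropping excluded ranks and that the IV root is then looked up in that list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder error code for this sketch only; it stands in for -DER_NONEXIST. */
#define SKETCH_NONEXIST (-1)

/* Return true if `rank` appears in `list[0..nr)`. */
static bool rank_in_list(const uint32_t *list, size_t nr, uint32_t rank)
{
	for (size_t i = 0; i < nr; i++) {
		if (list[i] == rank)
			return true;
	}
	return false;
}

/*
 * Hypothetical filter: copy every member rank that is not excluded into
 * `out` (the caller provides room for `nr_members` entries) and return
 * the number of ranks kept.
 */
static size_t filter_ranks(const uint32_t *members, size_t nr_members,
			   const uint32_t *excluded, size_t nr_excluded,
			   uint32_t *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr_members; i++) {
		if (!rank_in_list(excluded, nr_excluded, members[i]))
			out[kept++] = members[i];
	}
	return kept;
}

/*
 * Hypothetical lookup: return the index of `root` in the filtered list,
 * or SKETCH_NONEXIST if the root was filtered out.
 */
static int find_root(const uint32_t *filtered, size_t nr, uint32_t root)
{
	for (size_t i = 0; i < nr; i++) {
		if (filtered[i] == root)
			return (int)i;
	}
	return SKETCH_NONEXIST;
}
```

With members {0, 1, 2, 3} and rank 2 excluded, looking up rank 2 as the root in the filtered list returns the placeholder error, which is the kind of "root not in the rank list" outcome the description refers to.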

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@github-actions

github-actions bot commented May 9, 2025

Ticket title is 'DAOS checker cannot completed on Aurora after some engines excluded'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-17535

@daosbuild3
Collaborator

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16357/3/display/redirect

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-17535 branch from 79a3909 to 63edfe2 Compare June 17, 2025 03:08
@Nasf-Fan Nasf-Fan marked this pull request as ready for review June 18, 2025 13:40
@Nasf-Fan Nasf-Fan requested review from a team as code owners June 18, 2025 13:40
@Nasf-Fan Nasf-Fan requested a review from jgmoore-or June 18, 2025 13:40
@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16357/4/testReport/

@daosbuild3
Collaborator

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16357/4/execution/node/1468/log

jgmoore-or
jgmoore-or previously approved these changes Jun 30, 2025
excluded_list.rl_nr = 1;
excluded_list.rl_ranks = excluded_ranks;
excluded_ranks[0] = ivns_internal->cii_grp_priv->gp_self;
/* Perform refresh on local node */
Contributor

If you are now including self in the sync corpc, then perhaps the local update here should be removed?

Contributor Author

@Nasf-Fan Nasf-Fan Jul 25, 2025

Not sure, let me test whether that works or not.

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-17535 branch 3 times, most recently from 009fec2 to c28313a Compare July 26, 2025 04:41
@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16357/7/execution/node/1470/log

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-17535 branch from c28313a to 4e72a38 Compare July 28, 2025 04:12
@daosbuild3
Collaborator

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16357/8/execution/node/1637/log

@Nasf-Fan
Contributor Author

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16357/8/execution/node/1637/log

online_rebuild_mdtest failed because of DAOS-17751, which is unrelated to this patch.

@Nasf-Fan
Contributor Author

Ping reviewers, thanks!

Otherwise, it may mislead a subsequent crt_get_filtered_grp_rank_list()
call into treating the IV root as absent from the rank list, causing the
related IV operation to fail with -DER_NONEXIST.

This may not be a perfect solution for the current CaRT IV logic, but as
a temporary measure it lets CR work when some ranks are dead.

Add a new test case to verify this corner case.

Signed-off-by: Fan Yong <[email protected]>
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-17535 branch from 4e72a38 to 45b2b2a Compare September 3, 2025 10:54
@Nasf-Fan
Contributor Author

Nasf-Fan commented Sep 3, 2025

Resolve merge conflict.

@Nasf-Fan
Contributor Author

Ping reviewers, thanks!

@Nasf-Fan
Contributor Author

@jgmoore-or, would you please help review the patch? Thanks!

@Nasf-Fan Nasf-Fan closed this Dec 31, 2025
@Nasf-Fan Nasf-Fan deleted the Nasf-Fan/DAOS-17535 branch December 31, 2025 07:34
@gnailzenh
Contributor

@alexbarcelo is this PR sufficient for the problem, or is it just a workaround that requires more work?

@alexbarcelo
Contributor

Are you pinging me? Wrong Alex?

@gnailzenh
Contributor

@frostedcmos is this PR sufficient for the problem, or might it require more work?

@frostedcmos
Contributor

frostedcmos commented Jan 12, 2026

@frostedcmos is this PR sufficient for the problem, or it may require more work?

From CaRT's perspective, either the current behavior or this PR's behavior is fine; it is a matter of preference. However, it looks like ticket DAOS-17535 is properly solved by a different DAOS-level PR, #17329.

@Nasf-Fan
Contributor Author

Nasf-Fan commented Jan 12, 2026

@frostedcmos is this PR sufficient for the problem, or it may require more work?

From CaRT's perspective, either the current behavior or this PR's behavior is fine; it is a matter of preference. However, it looks like ticket DAOS-17535 is properly solved by a different DAOS-level PR, #17329.

Yes, we no longer need this PR.

7 participants